1 Data preparation

1.1 Metadata

##           Samples.Var1 Samples.Freq
## 1      3PGMouseControl            3
## 2                Blank            2
## 3              Control           60
## 4         LRRK2_R1441C           60
## 5 SingleCellLysateCtrl            3

1.2 Transcript abundances

## Samples are in the correct order
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 
## summarizing abundance
## summarizing counts
## summarizing length
  • We used kallisto (version kallisto_linux-v0.43.0) to create a reference index and estimate transcript abundances.

  • We downloaded the human reference transcriptome from Ensembl (release 95), including cDNA and ncRNA sequences:

ftp://ftp.ensembl.org/pub/release-95/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
ftp://ftp.ensembl.org/pub/release-95/fasta/homo_sapiens/ncrna/Homo_sapiens.GRCh38.ncrna.fa.gz

  • We removed sequences on scaffolds and kept only chromosomes 1, 2, …, 22, X and Y.

  • We included the ERCC spike-in sequences (ERCC.fa.gz) when creating the reference index.

  • All transcript and gene annotations are in the file “Annotation_Homo_sapiens.GRCh38.cdna.ncrna.rmCHR.txt”.

  • Features with zero expression across all samples have been removed.

  • Gene-level TPM (transcripts per million) estimates represent the overall transcriptional output of each gene

  • We created 2 filtered feature sets: All_biotypes, which includes all gene biotypes, and PCMRE, which includes only protein-coding genes plus control features (ribo.genes, mito.genes, ERCCs).
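For reference, TPM divides each transcript's estimated count by its effective length and rescales the resulting rates so that they sum to one million per sample. A gene-level value can then be obtained by summing the TPMs of the gene's transcripts. This is how tximport summarises abundances, and we assume it was used here given the "summarizing abundance" messages above. The sketch uses made-up counts, effective lengths and a hypothetical transcript-to-gene map, not values from this dataset.

```python
from collections import defaultdict


def tpm(est_counts, eff_lengths):
    """Transcripts per million from estimated counts and effective lengths."""
    # Length-normalised rate for each transcript (reads per base).
    rates = [c / l for c, l in zip(est_counts, eff_lengths)]
    total = sum(rates)
    # Rescale so that all transcripts in the sample sum to one million.
    return [r / total * 1e6 for r in rates]


def gene_tpm(tx_tpm, tx2gene):
    """Sum transcript-level TPMs into gene-level TPMs."""
    genes = defaultdict(float)
    for tx, value in tx_tpm.items():
        genes[tx2gene[tx]] += value
    return dict(genes)


# Hypothetical toy sample: three transcripts from two genes.
tx_ids = ["tx1", "tx2", "tx3"]
values = tpm([100, 50, 300], [1000.0, 500.0, 1500.0])
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
print(gene_tpm(dict(zip(tx_ids, values)), tx2gene))
# → {'geneA': 500000.0, 'geneB': 500000.0}
```

Because every sample sums to the same total, TPM values are comparable in proportion across samples, but they carry no information about sequencing depth.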

2 Transcript types

2.1 Counts per gene biotype

  • Sum of the raw counts per gene biotype

2.2 Log10 counts per biotype

  • Sum of the raw counts per gene biotype.

  • The sums are then log-transformed as log10(counts + 1).
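The biotype summaries in 2.1 and 2.2 amount to a group-by sum followed by a log10(x + 1) transform. The +1 keeps biotypes with zero counts finite. Below is a minimal single-sample sketch; the gene names, counts and biotype labels are hypothetical.

```python
import math
from collections import defaultdict


def counts_per_biotype(gene_counts, gene_biotype):
    """Sum the raw counts of one sample over the genes of each biotype."""
    sums = defaultdict(int)
    for gene, count in gene_counts.items():
        sums[gene_biotype[gene]] += count
    return dict(sums)


def log10p1(sums):
    """log10(counts + 1) for each biotype total."""
    return {biotype: math.log10(total + 1) for biotype, total in sums.items()}


# Hypothetical toy sample.
counts = {"g1": 90, "g2": 9, "g3": 0, "g4": 999}
biotype = {"g1": "protein_coding", "g2": "protein_coding",
           "g3": "lncRNA", "g4": "Mt_rRNA"}
sums = counts_per_biotype(counts, biotype)
print(sums)  # {'protein_coding': 99, 'lncRNA': 0, 'Mt_rRNA': 999}
print(log10p1(sums))
```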

3 Single cell gene expression object

## Warning in .local(object, ...): using library sizes as size factors
  • Gene expression data: 128 cells and 17641 features
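The warning above indicates that no other size factors were available, so library sizes were used instead. We assume the usual scater behaviour here. Each cell's factor is its total count divided by the mean total count across cells, so the factors average to 1. Log-normalised values are then log2(count / factor + 1). The sketch below is dependency-free and uses a made-up matrix with cells as rows; that layout is an assumption of the sketch.

```python
import math


def library_size_factors(counts):
    """counts: list of cells, each a list of raw gene counts."""
    lib_sizes = [sum(cell) for cell in counts]
    mean_size = sum(lib_sizes) / len(lib_sizes)
    # Centre the factors so that they average to 1 across cells.
    return [size / mean_size for size in lib_sizes]


def log_normalise(counts, size_factors, pseudocount=1.0):
    """log2(count / size_factor + pseudocount), cell by cell."""
    return [[math.log2(x / sf + pseudocount) for x in cell]
            for cell, sf in zip(counts, size_factors)]


# Hypothetical toy data: the second cell was sequenced 3x deeper.
counts = [[10, 0, 10], [30, 0, 30]]
sf = library_size_factors(counts)      # [0.5, 1.5]
logcounts = log_normalise(counts, sf)  # both cells become identical
print(sf, logcounts)
```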

4 Plate layout

4.1 Plate 1

## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.

4.2 Plate 2

## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.

5 QC features histograms

Number of cells in each level of the metadata variables:

  • Project 128

  • Genome 128

  • patient_ID 60 60

  • sex 120

  • reprogramming 120

  • WellLocation 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1

  • is_cell_control 120 8

  • is_cell_control_control 120 8

6 Library complexity

7 Mean expression vs. percentage of expressing cells

8 Top expressed features

9 PCA

  • PCA on the top 4000 most variable features, including all samples.
## Scale for 'colour' is already present. Adding another scale for
## 'colour', which will replace the existing scale.
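PCA restricted to the most variable features works as follows: rank features by their variance across cells, keep the top N, centre each one, and find the leading principal component. We understand that scater ranks features by the variance of their log-expression and then runs a full PCA. The dependency-free sketch below swaps in a simpler method that recovers only PC1, using power iteration on the feature covariance matrix. The dict-of-lists layout, the function names and the toy values are assumptions.

```python
import math
import random


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def top_variable_features(matrix, n_top):
    """matrix: dict feature -> per-cell (log-)expression values."""
    ranked = sorted(matrix, key=lambda f: sample_variance(matrix[f]), reverse=True)
    return ranked[:n_top]


def pc1_scores(matrix, features, n_iter=500, seed=0):
    """Cell scores on PC1, via power iteration on the feature covariance matrix."""
    # Centre every selected feature across cells.
    centred = []
    for f in features:
        mean = sum(matrix[f]) / len(matrix[f])
        centred.append([v - mean for v in matrix[f]])
    n_cells = len(centred[0])
    # Feature-by-feature sample covariance matrix.
    cov = [[sum(a * b for a, b in zip(x, y)) / (n_cells - 1) for y in centred]
           for x in centred]
    # Power iteration converges to the leading eigenvector (the PC1 loadings).
    rng = random.Random(seed)
    loadings = [rng.uniform(0.5, 1.5) for _ in features]
    for _ in range(n_iter):
        product = [sum(c * w for c, w in zip(row, loadings)) for row in cov]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0:
            break
        loadings = [p / norm for p in product]
    # Project each cell onto the loadings to get its PC1 score.
    return [sum(loadings[i] * centred[i][j] for i in range(len(features)))
            for j in range(n_cells)]


# Hypothetical toy data: features a and b separate two groups of cells.
expr = {"a": [0, 0, 0, 10, 10, 10],
        "b": [1, 1, 1, 11, 11, 11],
        "c": [5, 5, 5, 5, 5, 6]}
top = top_variable_features(expr, n_top=2)  # the report uses 4000 (later 1000)
print(top, pc1_scores(expr, top))
```

The sign of a principal component is arbitrary, so only the relative positions of cells along PC1 are meaningful.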

10 Filtering

10.1 Cells

  • Cells are clustered on PC1 to remove low-quality cells.

  • We use k-means clustering (k = 2) with the Euclidean distance on PC1.

  • Also removed: control cells (SingleCellLysateCtrl, 3PGMouseControl and Blank) and cells with an outlier number of total features (total_features_by_counts)

## Warning: 'add_ticks' is deprecated.
## Use '+ geom_rug(...)' instead.
## Scale for 'colour' is already present. Adding another scale for
## 'colour', which will replace the existing scale.

## 
##      3PGMouseControl                Blank              Control 
##                    3                    1                   41 
##         LRRK2_R1441C SingleCellLysateCtrl 
##                   32                    3
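Clustering cells on a single PC with k = 2 reduces to one-dimensional k-means, where the Euclidean distance is simply |x - centre|. The sketch below implements Lloyd's algorithm, with centres initialised at spread-out quantiles. R's kmeans() defaults to the Hartigan-Wong algorithm with random starts, so assignments of borderline cells may differ. The PC1 values are made up. The sketch does not decide which cluster holds the low-quality cells; in practice that is judged from QC metrics.

```python
def kmeans_1d(values, k=2, n_iter=100):
    """Lloyd's k-means on one coordinate, e.g. PC1 scores."""
    ordered = sorted(values)
    # Initial centres at evenly spaced quantiles of the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each cell goes to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(x - centres[c])) for x in values]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:  # converged
            break
        centres = new_centres
    return labels, centres


# Hypothetical PC1 scores for seven cells.
pc1 = [-5.1, -4.8, -5.0, 3.9, 4.2, 4.0, 0.5]
labels, centres = kmeans_1d(pc1)
print(labels, centres)
```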

10.2 Features

  • Filter out lowly expressed genes: keep genes expressed in at least 20% of the 72 remaining cells (i.e. non-zero expression in at least 14.4 cells, which in practice means at least 15 cells).

  • We also remove ERCC controls at this point.
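The feature filter can be written with integer arithmetic so that the 20% threshold is applied exactly: a gene detected in d of n cells is kept when 100·d ≥ 20·n, which for n = 72 means d ≥ 15. ERCC spike-ins are dropped by name, assuming their IDs start with "ERCC-". The helper also computes the per-cell average of detected features, the statistic reported in 10.3. The dict layout and function names are hypothetical.

```python
def filter_features(matrix, min_percent=20, drop_prefix="ERCC-"):
    """Keep features detected (count > 0) in at least min_percent % of cells.

    matrix: dict feature ID -> per-cell counts for the cells kept after QC.
    Integer arithmetic avoids floating-point edge cases in the threshold.
    """
    n_cells = len(next(iter(matrix.values())))
    kept = {}
    for feature, counts in matrix.items():
        if feature.startswith(drop_prefix):  # spike-ins are removed here too
            continue
        detected = sum(1 for c in counts if c > 0)
        if detected * 100 >= min_percent * n_cells:
            kept[feature] = counts
    return kept


def mean_features_per_cell(matrix):
    """Average number of features with non-zero counts per cell."""
    n_cells = len(next(iter(matrix.values())))
    per_cell = [sum(1 for counts in matrix.values() if counts[j] > 0)
                for j in range(n_cells)]
    return sum(per_cell) / n_cells


# Hypothetical 72-cell example.
toy = {
    "GENE_A": [1] * 14 + [0] * 58,  # detected in 14 cells: dropped
    "GENE_B": [3] * 15 + [0] * 57,  # detected in 15 cells: kept
    "ERCC-00002": [50] * 72,        # spike-in: dropped regardless
}
kept = filter_features(toy)
print(sorted(kept), mean_features_per_cell(kept))
```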

10.3 Cells and features after filtering

  • We end up with 72 cells and 6789 features.

  • The average number of features detected per cell is 3605.49.

11 Dopaminergic markers

  • We look for the expression of the following dopaminergic markers: TH, DDC, SLC6A3, SLC18A2, DRD2, LMX1A, LMX1B, FOXA2, NR4A2, PITX3, EN1, EN2.

  • Only the markers listed below remained after filtering.

## Scale for 'colour' is already present. Adding another scale for
## 'colour', which will replace the existing scale.

## ENSG00000180176.14 ENSG00000132437.17 ENSG00000165646.13 
##                 33                 33                 20 
## ENSG00000162761.14 ENSG00000153234.13 
##                 19                 16
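Checking which markers survived the feature filter is a lookup from gene symbol to Ensembl ID followed by a membership test. Duplicated symbols in the marker list are ignored. In this sketch, the symbol-to-ID map stands in for the annotation file and the IDs are placeholders. Reporting the number of cells with non-zero counts for each marker is our assumption; it is not taken from the table above.

```python
def retained_markers(markers, symbol_to_id, filtered):
    """Return {symbol: cells with non-zero counts} for markers kept after filtering.

    markers: gene symbols (duplicates are ignored, order is preserved).
    symbol_to_id: symbol -> gene ID (hypothetical stand-in for the annotation file).
    filtered: dict gene ID -> per-cell counts after feature filtering.
    """
    result = {}
    for symbol in dict.fromkeys(markers):  # de-duplicate, keep order
        gene_id = symbol_to_id.get(symbol)
        if gene_id in filtered:
            result[symbol] = sum(1 for v in filtered[gene_id] if v > 0)
    return result


# Placeholder IDs, not real Ensembl identifiers.
markers = ["TH", "DDC", "SLC18A2", "SLC18A2", "EN1"]
symbol_to_id = {"TH": "ID_TH", "DDC": "ID_DDC",
                "SLC18A2": "ID_SLC18A2", "EN1": "ID_EN1"}
filtered = {"ID_TH": [0, 2, 5], "ID_SLC18A2": [1, 0, 0]}
print(retained_markers(markers, symbol_to_id, filtered))
# → {'TH': 2, 'SLC18A2': 1}
```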

12 Glutamatergic markers

  • In addition to the dopaminergic markers, here we show glutamatergic markers: CTIP2 / BCL11B, KA1 / GRIK4, NMDAR1 / GRIN1, OTX1, TBR1.

  • We plot them together with the dopaminergic markers, using all features and cells before filtering (after filtering, only TBR1 remains).

## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.

13 LRRK2 expression

  • Gene expression from all features and cells before filtering.
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.

14 Cell specific markers

  • Gene expression from all features and cells before filtering.
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.

15 Re-calculate PC

15.1 Scater PCs

  • Principal component analysis based on the top 1000 most variable genes.

15.1.1 PCs by Group

15.1.2 Explanatory variables by PC

## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'Type' with fewer than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'Project' with fewer than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'Genome' with fewer than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'sex' with fewer than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'reprogramming' with fewer than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'is_cell_control' with fewer than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'is_cell_control_control' with fewer than 2 unique
## levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'pct_counts_in_top_50_features_ERCCs' with fewer
## than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'pct_counts_in_top_100_features_ERCCs' with fewer
## than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'pct_counts_in_top_200_features_ERCCs' with fewer
## than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'pct_counts_in_top_500_features_ERCCs' with fewer
## than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'PC1_k2_cluster' with fewer than 2 unique levels
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Scale for 'colour' is already present. Adding another scale for
## 'colour', which will replace the existing scale.
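For a categorical variable, the percentage of variance in a PC that it explains is the R² of a one-way linear model: the between-group sum of squares divided by the total sum of squares, times 100. We understand this to be what scater's getVarianceExplained reports, applied here to PC scores. A variable with fewer than two levels cannot be fitted, which is what the warnings above report. This sketch covers categorical variables only. Continuous covariates, such as the pct_counts metrics, would need an ordinary regression R². The function name and toy values are hypothetical.

```python
def variance_explained(pc_scores, variable):
    """Percentage of variance in one PC explained by a categorical variable.

    Returns None when the variable has fewer than 2 levels (nothing to fit).
    """
    levels = set(variable)
    if len(levels) < 2:
        return None
    n = len(pc_scores)
    grand_mean = sum(pc_scores) / n
    ss_total = sum((x - grand_mean) ** 2 for x in pc_scores)
    ss_between = 0.0
    for level in levels:
        group = [x for x, v in zip(pc_scores, variable) if v == level]
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
    return 100.0 * ss_between / ss_total


# Hypothetical PC1 scores for four cells.
pc1 = [1.0, 1.2, 3.0, 3.2]
print(variance_explained(pc1, ["Control", "Control", "LRRK2_R1441C", "LRRK2_R1441C"]))
print(variance_explained(pc1, ["scRNA"] * 4))  # None: only one level
```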

15.1.3 Corrgram

15.1.4 First 3 PC by metadata

## Scale for 'colour' is already present. Adding another scale for
## 'colour', which will replace the existing scale.

15.1.5 Selected components

  • Components that broadly separate the data by group.
## Scale for 'colour' is already present. Adding another scale for
## 'colour', which will replace the existing scale.

15.2 PC all features

  • Principal component analysis based on all features.

15.2.1 PCs by Group

15.2.2 Explanatory variables by PC

## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'Type' with fewer than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'Project' with fewer than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'Genome' with fewer than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'sex' with fewer than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'reprogramming' with fewer than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'is_cell_control' with fewer than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'is_cell_control_control' with fewer than 2 unique
## levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'pct_counts_in_top_50_features_ERCCs' with fewer
## than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'pct_counts_in_top_100_features_ERCCs' with fewer
## than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'pct_counts_in_top_200_features_ERCCs' with fewer
## than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'pct_counts_in_top_500_features_ERCCs' with fewer
## than 2 unique levels
## Warning in getVarianceExplained(dummy, variables = variables, exprs_values
## = "pc_space", : ignoring 'PC1_k2_cluster' with fewer than 2 unique levels
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Scale for 'colour' is already present. Adding another scale for
## 'colour', which will replace the existing scale.

15.2.3 Corrgram

  • pct_counts_in_top_50_features_Ribo, pct_counts_ERCCs, pct_counts_in_top_500_features_endogenous, total_features_by_counts_endogenous, log10_total_counts_ERCCs, total_features_by_counts, Plate

15.2.4 First 5 PC by metadata

## Scale for 'colour' is already present. Adding another scale for
## 'colour', which will replace the existing scale.

16 Counts by sequencing depth

  • Check the relationship between expressed counts and reads, and between counts and sequencing depth.